Abstract


We  apply  Multiple  Model  Adaptive  Estimation  (MMAE),  a  proven  method  of  system 
identification  widely  used  in  engineering  applications,  to  the  problem  of  determining  Bayesian 
probability  distributions  of  the  final  cost  and  completion  time  of  on-going  Research  and 
Development (R&D) programs, conditioned on actual cost of work performed (ACWP) data.
Modeling  cumulative  expenditures  with  Rayleigh  distributions,  we  produce  graphs  of  the  results 
that  give  useful  assessments  of  cost  and  schedule  risks.  The  procedure  is  implemented  in  a 
convenient  spreadsheet.  We  give  three  examples  of  its  application  to  actual  data,  and  results  of  a 
Monte  Carlo  analysis  verify  the  method. 


1.  Introduction 

Estimates  of  cost  and  duration  for  Research  and  Development  (R&D)  programs  often 
increase  significantly  during  the  project.  Development  costs  of  the  Concorde  aircraft  exceeded 
original estimates by more than a factor of five. In defense acquisition, where development
programs for major weapon systems (aircraft, tanks, missiles) often cost billions of dollars, some
development programs' final costs and completion times have been twice the original projections.

R&D  programs  typically  undergo  periodic  reviews,  at  which  estimates  of  the  cost-to-go 
are  critical  data  for  decisions  on  whether  or  not  to  continue.  At  such  reviews,  point  estimates  of 
final  cost  and  completion  times  are  not  particularly  helpful  to  management  because  of  their 
uncertainty.  Even  a  firm  fixed-cost  development  contract  does  not  guarantee  a  total  final  cost 
since  requests  for  equitable  adjustment  often  add  substantially  to  the  costs  of  a  program. 

At  an  intermediate  review,  management  needs  quantitative  estimates  of  the  cost  and 
schedule  risks  of  continuing  the  program.  They  need  estimates  of  the  probability  distribution  of 
final  cost  and  completion  times,  conditioned  on  present  knowledge,  for  example  on  expenditures 
to date. Knowing whether the available information indicates that final cost and completion time
are likely to fall in relatively narrow intervals or, conversely, that costs and completion times
spanning relatively broad intervals are all about equally likely can greatly benefit decision
making.

In  this  paper,  we  develop  a  method  for  determining  Bayesian  probability  distributions  of 
final  cost  and  completion  times  of  R&D  programs,  from  data  on  incurred  costs  (specifically, 
from  the  actual  cost  of  work  performed  (ACWP)  data  provided  in  cost  performance  reports).  A 
spreadsheet  that  is  convenient  for  use  on  microcomputers  implements  the  algorithm.  With  this 
tool, management may easily assess the cost and schedule risks inherent in continuing an R&D
program.

The method that we apply, Multiple Model Adaptive Estimation (MMAE) [16,17], is
widely used by scientists and engineers dealing with electronic and mechanical systems. MMAE
is a method for system identification, that is, identifying the unknown properties of a system
from observations in order to predict the system's future behavior. System identification is an extensively
developed  part  of  mathematical  system  theory.  Since  many  tasks  in  cost  analysis  are  system 
identification  tasks,  it  seems  helpful  to  apply  that  knowledge  to  them. 

MMAE  requires  a  model  of  the  system  studied,  and  in  this  paper  we  use  the  Rayleigh 
probability  model  for  the  time-history  of  expenditures  in  an  R&D  program.  Several  cost  analysts 
studied the applicability of that model [1,2,6,7,10,11,12,20,21], and concluded that it represents
R&D  phases  of  major  defense  acquisition  programs  well. 

MMAE  involves  the  use  of  Kalman  filters  to  estimate  the  state  of  a  system,  given  noisy 
observations.  A  system’s  “state”  is  a  set  of  parameters  that  describe  its  configuration  fully,  and 
determine  its  future  evolution  (given  future  inputs).  For  example,  in  Newtonian  mechanics  the 
state  of  a  mass  point  is  a  set  of  three  position  coordinates  and  three  velocity  coordinates.  In  this 
paper,  we  define  the  state  of  a  development  project  as  its  earned  value,  measured  by  ACWP. 

The  Kalman  filter  [13,14,15]  uses  a  model  of  the  system  to  project  the  Bayesian 
probability  density  of  its  state,  conditioned  on  a  set  of  noisy  observations.  The  Kalman  filter 
results  are  optimal  for  linear  system  models,  Gaussian  noises,  and  natural  definitions  of 
“optimal.”  The  filter  computations  proceed  iteratively  and  are  computationally  tractable. 

Our  application  of  MMAE  determines  the  likelihood  of  various  values  of  the  two 
parameters  of  a  Rayleigh  model,  based  on  the  residuals  from  a  set  of  Kalman  filters.  This  allows 
us  to  produce  graphs  of  the  probability  that  the  final  cost  or  the  completion  time  will  not  exceed 
any  particular  value.  These  graphs  give  managers  a  clear  indication  of  the  cost  and  schedule  risk 
in continuing a development program.

We discuss the Rayleigh model and its applicability to R&D programs in Section 2.
Section  3  presents  the  development  of  a  dynamics  model  for  earned  value  over  time.  We 
describe  Kalman  filters  and  MMAE  as  used  in  this  application  in  Sections  4  and  5,  respectively. 
In  Section  6,  we  summarize  the  steps  in  the  method.  Section  7  contains  sample  applications, 
and  a  Monte  Carlo  analysis  of  the  proposed  technique  is  presented  in  Section  8.  The  paper 
concludes  with  a  summary. 


2.  The  Rayleigh  Model 

Norden  proposed  that  the  Rayleigh  distribution  function  can  model  expenditures  for  R&D 
programs  [18].  He  stated: 

that  there  are  regular  patterns  of  manpower  buildup  and  phase-out  in 
complex  projects. ...  The  cycles  do  not  depend  on  the  nature  or  work 
content  of  the  project  but  seem  to  be  a  function  of  the  way  groups  of 
engineers  and  scientists  tackle  complex  technological  development 
problems. 

Norden derived the relationship from the assumption that the effectiveness with which problems
are solved improves as a linear function of time. "Norden's description of the process is this: The
rate of accomplishment is proportional to the pace of the work times the amount of work
remaining to be done." [19] Putnam summarized testing of the Rayleigh model on estimating
manpower for over 200 software development projects as follows:

Many  of  these  also  exhibit  the  same  basic  manpower  pattern— a  rise, 
peaking,  and  exponential  tail  off  as  a  function  of  time.  Not  all  systems 
follow  this  pattern. ...  It  is  because  manpower  is  applied  and  controlled  by 
management.  Management  may  choose  to  apply  it  in  a  manner  that  is 
suboptimal or contrary to system requirements. [20]

Within the Department of Defense, weapon system R&D expenditures often follow a
Rayleigh cumulative distribution function [1,2,6,7,10,11,12,20,21]. Watkins [21], Abernethy
[1], Lee, Hogue and Hoffman [12], and Elrod [2] tested the ability of the Rayleigh model to fit
actual weapon system R&D data. They all concluded that the Rayleigh model fits well. Lee,
Hogue, and Gallagher [10,11] presented a procedure, based on the Rayleigh model, to determine
budget profiles from an R&D estimate.

The  Rayleigh  model  for  cumulative  earned  value  during  R&D  is 

$$ v(t) = d\left[1 - \exp\left(-a t^{2}\right)\right] \qquad (1) $$

where v represents the earned value at time t. In this paper we model earned value by
expenditures (as reported by ACWP) expressed in constant dollars. The parameter d scales the
Rayleigh cumulative distribution function to costs, and the shape parameter, a, determines the
time of peak rate of expenditures, $t_p$:

$$ t_p = \frac{1}{\sqrt{2a}} \qquad (2) $$


Since  the  Rayleigh  distribution  function  has  an  infinite  tail,  the  modeled  expenditures 
would never terminate. We define the time of final development, $t_f$, as the time when 97 percent
of the expenditures are complete:

$$ D = v(t_f) = 0.97\,d \qquad (3) $$

where D is the total R&D program cost. The final time relates to the time of peak rate of
expenditures by $t_f = 2.65\,t_p$ [11]. In addition, the Rayleigh shape parameter a can be
determined from a projection of the completion time with

$$ a = \frac{3.5}{t_f^{2}} \qquad (4) $$


We  employ  the  Rayleigh  model  to  predict  the  change  in  earned  value  as  time  passes. 
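
As a quick numerical check of these relationships, the following is a minimal Python sketch. The function names and the sample values (an 8-year completion time and a cost scale of 1000) are illustrative assumptions, not part of the spreadsheet described in this paper. It evaluates (1), the time of peak expenditure rate, and the shape parameter implied by a projected completion time.

```python
import math

def rayleigh_cumulative(t, d, a):
    """Cumulative earned value from (1): v(t) = d * (1 - exp(-a * t^2))."""
    return d * (1.0 - math.exp(-a * t * t))

def peak_time(a):
    """Time at which the expenditure rate (the derivative of (1)) is largest."""
    return 1.0 / math.sqrt(2.0 * a)

def shape_from_completion_time(t_f, fraction=0.97):
    """Shape parameter a for which v(t_f) = fraction * d (about 3.5 / t_f^2)."""
    return -math.log(1.0 - fraction) / (t_f * t_f)

# Hypothetical program: 8-year completion time, cost scale of 1000 (constant dollars).
t_f, d = 8.0, 1000.0
a = shape_from_completion_time(t_f)
fraction_spent = rayleigh_cumulative(t_f, d, a) / d
ratio = t_f / peak_time(a)
print(f"a = {a:.4f}, fraction spent by t_f = {fraction_spent:.2f}, t_f / t_p = {ratio:.2f}")
```

The exact constant for a 97 percent cutoff is ln(1/0.03), which the text rounds to 3.5.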

3.  Earned  Value  Over  Time 

A  generalized  model  that  embraces  both  Rayleigh  and  Parr  [19]  models  is 

$$ \frac{dv}{dt} = F(v) \qquad (5) $$

where  v  is  earned  value  and  F(v)  gives  the  rate  at  which  the  project  absorbs  resources  efficiently. 
The  function  F(v)  is  like  Parr's  "number  of  visible  jobs"  to  which  effort  can  efficiently  be 
applied. The function F(v) must satisfy some common-sense conditions: F(v) must be positive,
except that it is zero at $v = v(t_f)$, the final value of the project, and, possibly, also at $v = v(t_0)$, the
project start. F(v) must be increasing in some neighborhood of $v = v(t_0)$ and decreasing in some
neighborhood of $v = v(t_f)$.

If  F(v)  is  also  continuous,  (5)  is  uniquely  soluble  in  the  form 

$$ P(v) = t $$

where the continuously differentiable function P(v) satisfies dP/dv = 1/F with initial condition
P(0) = 0. By the positivity of F, P is monotone increasing, so the inverse function $P^{-1}$ exists, and

$$ v = P^{-1}(t). $$

This  formulation  is  a  generalization  of  the  Rayleigh  case  shown  in  (1).  A  straightforward 
calculus exercise using (1) shows that the P(v) corresponding to Rayleigh is

$$ P(v) = \left[-\frac{1}{a}\ln\left(1 - \frac{v}{d}\right)\right]^{1/2} $$

and the F(v) for the Rayleigh case is

$$ F(v) = 2a\,(d - v)\left[-\frac{1}{a}\ln\left(1 - \frac{v}{d}\right)\right]^{1/2}. $$

Solving (5) with initial condition $v(t_i) = v_i$ for the Rayleigh case, one gets

$$ v(t) = P^{-1}\bigl(t - t_i + P(v_i)\bigr) = d\left\{1 - \exp\left[-a\left(t - t_i + \sqrt{-\frac{1}{a}\ln\left(1 - \frac{v_i}{d}\right)}\right)^{2}\right]\right\} \qquad (6) $$

for $t > t_i$. We apply the Rayleigh model as employed in (6) to predict the earned value at a future
time  given  an  earlier  estimate  of  the  earned  value.  Equation  (6)  is  the  dynamics  model  that 
propagates  state  estimates  (means  of  the  Bayesian  probability  distribution  functions)  for  earned 
value  through  time  in  the  Kalman  filter  formulation. 
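
Equation (6) amounts to finding the model time at which the Rayleigh curve reaches the current earned value and then advancing along the curve from there. The following is a minimal Python sketch of that step; the function names and the parameter values (d = 1000, a = 0.05) are hypothetical and chosen only for illustration.

```python
import math

def rayleigh_clock(v, d, a):
    """P(v): the time at which the Rayleigh curve (1) reaches earned value v, 0 <= v < d."""
    return math.sqrt(-math.log(1.0 - v / d) / a)

def propagate(v_i, t_i, t, d, a):
    """Equation (6): earned value at time t >= t_i, given earned value v_i at time t_i."""
    shifted = t - t_i + rayleigh_clock(v_i, d, a)
    return d * (1.0 - math.exp(-a * shifted * shifted))

# Hypothetical parameters.
d, a = 1000.0, 0.05
on_curve = d * (1.0 - math.exp(-a * 2.0 ** 2))    # value of (1) at t = 2
ahead = propagate(on_curve, 2.0, 5.0, d, a)       # advance from t = 2 to t = 5
direct = d * (1.0 - math.exp(-a * 5.0 ** 2))      # value of (1) at t = 5
print(ahead, direct)
```

When the starting value lies on the curve (1), the propagated value agrees with (1) at the later time; a starting value above the curve is carried forward from a later point on the same curve.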


4.  Kalman  Filter 

The  Kalman  filter  is  an  iterative  Bayesian  state  estimation  technique.  (Maybeck  presents 
a  thorough  discussion  in  [15].)  The  state  is  the  random  variable  of  interest;  in  this  application  to 
R&D  programs,  the  state  is  the  earned  value  and  the  measurements  are  the  reports  of  actual  costs 
incurred.  The  first  stage  of  the  Kalman  filter  propagates  the  state  distribution  through  time  based 
on  a  dynamics  model.  The  second  stage  updates  the  distribution  with  the  information  from  an 
actual  measurement  of  the  system.  The  Kalman  filter  algorithm  repeats  these  two  steps  for  each 
available measurement. This section develops the propagation and update stages of a Kalman
filter. In this section, we assume the three parameters required in a Kalman filter are known; in the
next  section,  we  apply  Multiple  Model  Adaptive  Estimation  (MMAE)  [16,17],  another  Bayesian 
technique  that  uses  many  Kalman  filters  each  with  a  different  combination  of  assumed 
parameters,  to  evaluate  the  likelihood  of  various  parameter  values. 

For this application, we define the Kalman filter state, $x(t_i)$, as the cumulative earned
value (expenditures expressed in constant dollars) at time $t_i$; thus, $x(t_i) = v(t_i)$. We indicate the
means of Bayesian probability distributions for the Kalman filter state by a hat. At the time of
each measurement, the Kalman filter algorithm calculates two state distribution means. A
superscript minus sign indicates the distribution mean prior to incorporating the measurement
update, $\hat{x}(t_i^-)$. Similarly, a superscript plus sign indicates the distribution mean updated with
the information from a measurement at time $t_i$, $\hat{x}(t_i^+)$.

The steps in a Kalman filter iterate between propagation of the distribution mean through
time and measurement update of the distribution mean. The state propagation is determined for
the Rayleigh model in (1) with (6) as

$$ \hat{x}(t_i^-) = d\left\{1 - \exp\left[-a\left(t_i - t_{i-1} + \sqrt{-\frac{1}{a}\ln\left(1 - \frac{\hat{x}(t_{i-1}^+)}{d}\right)}\right)^{2}\right]\right\} \qquad (7) $$


The  appropriate  initial  state  distribution  mean  in  this  application  is  zero  because  no  expenditures 
can be incurred before the beginning of the program; $\hat{x}(t_0) = 0$.

The  measurement  update  step  incorporates  the  new  information  from  a  measurement. 
The notation for the measurement at time $t_i$ is $z_i$. In this application, the measurement is the
value of ACWP reported in the cost performance reports, adjusted for inflation. Since the
measurement is a direct measure of the state, the Kalman filter residual is the difference between
the measurement and the mean of the state distribution prior to incorporating the measurement:

$$ r_i = z_i - \hat{x}(t_i^-). \qquad (8) $$

The Kalman filter gain, k, weights the information provided by the dynamics model along with
the prior measurements against the information provided by the new measurement. Thus, the
Kalman filter algorithm calculates the updated state distribution mean with

$$ \hat{x}(t_i^+) = \hat{x}(t_i^-) + k\,r_i = (1 - k)\,\hat{x}(t_i^-) + k\,z_i. \qquad (9) $$

Since  the  Kalman  filter  gain  provides  the  relative  weighting  of  two  pieces  of  information 
about the system available at the time, the gain is bounded between zero and one; $0 \le k \le 1$. If
the  gain  is  zero,  the  update  distribution  mean  is  based  entirely  on  the  dynamics  model;  whereas  if 
the  gain  is  one,  the  updated  mean  is  the  last  measurement.  With  values  for  d,  a,  and  k,  one  can 
apply  a  Kalman  filter  using  (7),  (8)  and  (9)  iteratively  for  each  available  measurement  (reported 
actual  cost).  The  next  section  presents  a  development  for  Bayesian  estimation  of  these  three 
parameters. 
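
The iteration of (7), (8), and (9) can be sketched in a few lines of Python. This is a minimal illustration, not the spreadsheet implementation: the function names, the sample data, and the clamping safeguard inside `propagate` are assumptions of this sketch.

```python
import math

def propagate(x_prev, dt, d, a):
    """Equation (7): advance the state mean by dt along the Rayleigh model."""
    # Safeguard (an assumption of this sketch): keep the ratio inside [0, 1)
    # so the logarithm stays defined if an update pushes the mean past d.
    frac = min(max(x_prev / d, 0.0), 1.0 - 1e-12)
    shifted = dt + math.sqrt(-math.log(1.0 - frac) / a)
    return d * (1.0 - math.exp(-a * shifted * shifted))

def run_filter(times, costs, d, a, k, t0=0.0):
    """Process cumulative cost reports; return the residuals and the final updated mean."""
    x_plus, t_prev, residuals = 0.0, t0, []
    for t, z in zip(times, costs):
        x_minus = propagate(x_plus, t - t_prev, d, a)   # (7) propagate
        r = z - x_minus                                 # (8) residual
        x_plus = x_minus + k * r                        # (9) update
        residuals.append(r)
        t_prev = t
    return residuals, x_plus

# Hypothetical data: yearly reports lying exactly on a Rayleigh curve.
d, a = 1000.0, 0.05
times = [1.0, 2.0, 3.0, 4.0]
costs = [d * (1.0 - math.exp(-a * t * t)) for t in times]
residuals, x_final = run_filter(times, costs, d, a, k=0.5)
print(max(abs(r) for r in residuals))
```

With data that lie exactly on the assumed curve, the residuals vanish for any gain; with k = 1 the updated mean simply follows the last measurement, and with k = 0 it follows the dynamics model alone.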


5.  Multiple  Model  Adaptive  Estimation  (MMAE) 

MMAE  is  a  Bayesian  system  identification  technique  that  estimates  unknown  system 
parameters  when  applying  Kalman  filters  [16,17].  In  this  application,  we  use  MMAE  to 
determine  the  likelihood  for  the  parameters  d  (cost  scale  parameter),  a  (Rayleigh  shape 
parameter), and k (Kalman filter gain). The advantage of applying MMAE is that the
probabilities are conditional on the actual cost data, which prevents assigning probabilities to
final  costs  below  the  incurred  cost  or  completion  times  less  than  the  elapsed  duration. 

An  overview  of  the  algorithm  follows:  The  set-up  for  employing  MMAE  is  to  discretize 
the  continuous  space  for  each  parameter  into  a  set  of  representative  points.  The  MMAE 
algorithm  processes  the  measurements  (reported  actual  costs  in  this  application)  through  a 
Kalman filter at each combination of discrete parameters. Each filter's residuals determine the
probability  of  that  filter's  parameters  being  correct,  conditioned  on  the  measurements  processed 
to  that  time.  After  processing  all  the  available  measurements,  the  filter  probabilities  indicate  the 
likelihood  of  the  parameters  in  that  filter  being  correct  conditioned  on  the  measurements.  We 
relate  the  filter  parameter  d  to  total  program  cost  with  (3)  and  the  filter  parameter  a  to  project 
duration  with  the  relationship  in  (4).  We  incrementally  add  the  final  filter  probabilities  as  the 
filter  parameters  for  d  increase  to  generate  curves  depicting  the  cumulative  probability  of  the 
final  cost  being  less  than  any  particular  value.  Similarly,  we  incrementally  sum  the  filter 
probabilities  as  the  values  for  a  increase  to  determine  a  likelihood  curve  for  project  duration. 

The  details  of  the  algorithm  begin  with  the  set-up  for  applying  MMAE,  discretizing  the 
parameter space. Define the number of Kalman filters as L. Let $\mathbf{a}_l$ represent the vector of
parameters $d_l$, $a_l$, and $k_l$ selected for the $l$th filter, where $l = 1, \ldots, L$. With a vector of
parameters $\mathbf{a}_l$, one can process the data through a Kalman filter by iteratively applying (7), (8)
and  (9).  In  the  examples  and  Monte  Carlo  analysis,  we  used  20  values  for  d,  20  values  for  a,  and 
5  values  for  k,  equally  spaced  in  each  dimension.  Thus,  we  processed  the  reported  cost  data 
through  2,000  Kalman  filters. 
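
The discretization described above can be sketched as a Cartesian product of equally spaced values. The following minimal Python sketch uses hypothetical range endpoints; only the counts (20, 20, and 5) come from the text.

```python
from itertools import product

def linspace(lo, hi, n):
    """Return n equally spaced values from lo to hi inclusive."""
    if n == 1:
        return [lo]
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

def build_filter_bank(d_range, a_range, k_range=(0.0, 1.0), n_d=20, n_a=20, n_k=5):
    """One (d, a, k) parameter vector per Kalman filter."""
    d_values = linspace(d_range[0], d_range[1], n_d)
    a_values = linspace(a_range[0], a_range[1], n_a)
    k_values = linspace(k_range[0], k_range[1], n_k)
    return list(product(d_values, a_values, k_values))

# Hypothetical ranges for the cost scale and shape parameters.
filters = build_filter_bank(d_range=(100.0, 500.0), a_range=(0.01, 0.4))
print(len(filters))
```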

Our approach discretizes the parameter space in two steps. The first step is processing
the measurement data through filters with a coarse discretization, and the second step is refining
the discretization based on the filters' sums of squared residuals. Let the measurement history be
represented by $Z_N = \{z_1, z_2, \ldots, z_N\}$, where $z_i$ is the cumulative cost incurred at time index $t_i$.

We determine the range of the Rayleigh parameter a from estimates of the minimum and
maximum completion time with (4), and we vary the values of a incrementally over the range.
The default range for estimated completion times is from a minimum of the time of the last cost
report, $t_N$, to an arbitrary maximum time of 15 years. (For example, if the maximum completion
time is 15 years, $a_1 = 3.5/15^{2}$ is actually the smallest shape parameter.) Our
algorithm sets the minimum value for the cost scale parameter equal to the last reported cost,
$d_1 = z_N$, and sets the maximum value by matching the amount and time of the last cost report to
the Rayleigh curve for the longest program,

$$ d_m = \frac{z_N}{1 - \exp\left(-a_1 t_N^{2}\right)} \qquad (10) $$

The  Kalman  filter  gain  ranges  from  0  to  1.  An  analyst  may  adjust  either  the  cost  parameter  or 
completion  time  ranges.  The  algorithm  processes  the  cost  data  through  each  of  the  Kalman 
filters  with  this  initial  coarse  discretization  of  the  parameter  space.  We  use  this  first  pass  through 
the  data  to  estimate  the  residual  variance  and  to  refine  the  parameter  discretization. 
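
The default ranges can be computed directly from the last cost report. The following is a minimal Python sketch, assuming (as the text describes for (10)) that the maximum cost scale is the one that places the last report on the Rayleigh curve of the longest program; the function name and sample numbers are hypothetical.

```python
import math

def default_ranges(t_last, z_last, t_max=15.0):
    """Default (a, d) ranges from the time and amount of the last cost report."""
    a_lo = 3.5 / t_max ** 2     # longest program gives the smallest shape parameter, via (4)
    a_hi = 3.5 / t_last ** 2    # shortest program: completion at the last report
    d_lo = z_last               # final cost scale cannot be below incurred cost
    # Assumed form of (10): the last report lies on the curve of the longest program.
    d_hi = z_last / (1.0 - math.exp(-a_lo * t_last ** 2))
    return (a_lo, a_hi), (d_lo, d_hi)

# Hypothetical program: 3 years of reports, 100 (constant dollars) spent so far.
a_range, d_range = default_ranges(t_last=3.0, z_last=100.0)
print(a_range, d_range)
```

An analyst with program knowledge would normally narrow these defaults, as noted above.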

MMAE determines each filter's probability from the magnitude of that filter's residuals. The
Kalman  filter  residuals  for  linear  systems  with  known  structural  matrices  and  driven  by  white 
noise are independent and Gaussian distributed with zero mean and known variance [15].
Although these assumptions are not met in this case, other applications assumed that the
residuals  are  Gaussian  and  obtained  useful  results  [3, 4, 5, 8, 9].  We  assumed  that  the  residuals 
calculated  with  (8)  are  zero  mean  with  a  variance  estimated  from  the  Kalman  filter  with  the 
smallest  sum  of  squared  residuals  from  an  initial  pass  through  the  data; 

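[Equation (11), which gives the residual variance estimate $s_r^2$ computed from the minimum sum of squared residuals, is illegible in the source.]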

After  the  first  pass  of  the  data  through  the  bank  of  Kalman  filters,  we  reduce  the 
parameter range to eliminate parameter values that resulted in a sum of squared residuals greater
than three times the minimum value. Our algorithm equally spaces the parameters for the Kalman
filters  across  the  reduced  parameter  ranges.  The  algorithm  calculates  the  MMAE  probabilities  on 
the  second  pass  of  the  data  through  the  Kalman  filters.  Based  on  the  assumption  of  zero  mean 
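and the residual variance estimated in (11), the Gaussian probability density function for the $i$th
measurement, $z_i$, conditioned on the $l$th filter's vector of parameters, $\mathbf{a}_l$, and the prior
measurement history, $Z_{i-1}$, is

$$ f(z_i \mid \mathbf{a}_l, Z_{i-1}) = \frac{1}{\sqrt{2\pi s_r^{2}}} \exp\left(-\frac{r_i^{2}}{2 s_r^{2}}\right) $$

where $r_i$ is the $l$th filter's residual from (8), as adapted from Equation (10-98) in Reference [16].
The probability for the $j$th filter having the "correct" parameters conditioned on the measurement
history through time $t_i$ is

$$ p_j(t_i \mid Z_i) = \frac{f(z_i \mid \mathbf{a}_j, Z_{i-1})\, p_j(t_{i-1} \mid Z_{i-1})}{\sum_{l=1}^{L} f(z_i \mid \mathbf{a}_l, Z_{i-1})\, p_l(t_{i-1} \mid Z_{i-1})} \qquad (12) $$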

from Equation (10-104) in Reference [16]. The probabilities at each measurement time, $t_i$ for
$i = 1, \ldots, N$, must sum to one:

$$ \sum_{l=1}^{L} p_l(t_i) = 1. \qquad (13) $$

The  initial  or  a  priori  probabilities  account  for  information  available  about  the  likelihood 
of  particular  filter  combinations  before  the  measurement  data  are  processed.  If  no  information  is 
available, the a priori probabilities should all be equal; $p_l(t_0) = 1/L$ for $l = 1, \ldots, L$. In addition, if
any  of  the  filter  probabilities  became  zero,  that  filter's  probabilities,  calculated  with  (12),  would 
remain  zero  for  all  the  later  times.  To  prevent  prematurely  discarding  potentially  viable  filter 
parameters,  practitioners  commonly  apply  a  heuristic  [16,17];  if  any  of  the  filter  probabilities 
decreases  below  a  very  small  lower  bound,  such  as  0.0001,  the  heuristic  artificially  increases  that 
filter’s  probability  to  the  lower  bound.  The  filter  probabilities  that  result  after  the  last  datum  are 
not  adjusted  with  this  heuristic.  The  final  filter  probabilities  represent  the  likelihood  of  each 
combination of model parameters conditioned on the available measurement history, $Z_N$.
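
One step of the probability update, including the lower-bound heuristic, might look like the following minimal Python sketch. It follows the order given in this paper (weight by the Gaussian density, normalize, then apply the lower bound except after the last datum). The function names and sample residuals are hypothetical, and the guard against a zero normalizing sum is an assumption of this sketch.

```python
import math

def gaussian_density(r, variance):
    """Zero-mean Gaussian density of a residual r with the given variance."""
    return math.exp(-r * r / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def update_probabilities(probs, residuals, variance, floor=1e-4, apply_floor=True):
    """One step of (12): reweight each filter by the density of its residual."""
    weighted = [gaussian_density(r, variance) * p for r, p in zip(residuals, probs)]
    total = sum(weighted)
    if total == 0.0:
        # Assumption of this sketch: if every density underflows, keep the priors.
        return list(probs)
    updated = [w / total for w in weighted]          # normalize so that (13) holds
    if apply_floor:
        # Lower-bound heuristic; skipped after the last datum.
        updated = [max(p, floor) for p in updated]
    return updated

# Three hypothetical filters with equal priors and residuals of 0, 1, and 10.
priors = [1.0 / 3.0] * 3
floored = update_probabilities(priors, [0.0, 1.0, 10.0], variance=1.0)
final = update_probabilities(priors, [0.0, 1.0, 10.0], variance=1.0, apply_floor=False)
print(floored, final)
```

Because the floored values are renormalized at the next measurement, a filter that fits poorly early on can still recover probability later.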

We  use  the  filter  probabilities  to  determine  estimates  for  the  final  cost  and  completion 
time. The final cost corresponding to the parameters $d_l$ and $a_l$ is $D_l = T_l d_l$; the translation
factor, $T_l$, is 0.97 from (3) for $D_l$ expressed in constant dollars. To express $D_l$ in current dollars,
the translation factor must account for inflation during the program. Consider the sequence of
fiscal-year start dates during the program, running from the program start to the projected
program end, together with the corresponding inflation indices for the following fiscal years.
The translation factor corresponding to $a_l$ is computed from these dates and indices (the symbols
and the formula are illegible in the source). Each $D_l$
should be constrained to be greater than or equal to the last cost report, expressed appropriately in
constant or current dollars. The mean estimate of final cost conditioned on the available
measurement history, $Z_N$, is calculated with

$$ \hat{D} = \sum_{l=1}^{L} D_l\, p_l(t_N \mid Z_N) \qquad (14) $$

where $t_N$ is the time index corresponding to the last available cost report, $z_N$. Similarly, the
conditional mean estimate for completion time, based on (4), is

$$ \hat{t}_f = \sum_{l=1}^{L} \left(\frac{3.5}{a_l}\right)^{0.5} p_l(t_N \mid Z_N) \qquad (15) $$



The  program  estimates  from  (14)  and  (15)  are  the  most  likely  conditional  on  the  actual  cost  data. 
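
The probability-weighted averages in (14) and (15) reduce to two sums over the filter bank. The following is a minimal Python sketch for the constant-dollar case, where the translation factor is 0.97; the function name and the two sample filters are hypothetical, and the probabilities are assumed to be already normalized.

```python
import math

def conditional_means(filters, probs, translation=0.97):
    """Probability-weighted final cost (14) and completion time (15).

    filters: list of (d, a, k) tuples; probs: final filter probabilities summing to one.
    """
    mean_cost = sum(translation * d * p for (d, a, k), p in zip(filters, probs))
    mean_time = sum(math.sqrt(3.5 / a) * p for (d, a, k), p in zip(filters, probs))
    return mean_cost, mean_time

# Two hypothetical filters: a 5-year program at d = 100 and a 10-year program at d = 200.
filters = [(100.0, 3.5 / 25.0, 0.5), (200.0, 3.5 / 100.0, 0.5)]
probs = [0.25, 0.75]
mean_cost, mean_time = conditional_means(filters, probs)
print(mean_cost, mean_time)
```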

The  cumulative  probability  cost  curve  conditioned  on  the  measurement  history  shows  the 
probability that the final cost, D, will be less than any dollar value. Let the cost scale parameters
increase from $d_1$ to $d_m$. The sum of filter probabilities for all the filters with $d_i$ represents the
probability over the range $[0.5(d_{i-1} + d_i),\ 0.5(d_i + d_{i+1})]$ for $i = 2, \ldots, m-1$. There is no
conditional probability below $d_1$ or above $d_m$. Define $\bar{d}_i = 0.5(d_i + d_{i+1})$ for $i = 1, \ldots, m-1$, and
$\bar{d}_m = d_m$. The final cost estimate for the Rayleigh model with parameters $\bar{d}_l$ and $a_l$ is
calculated as $\bar{D}_l = T_l \bar{d}_l$, where $T_l$ is the translation factor to constant or current dollars used in
(14). The final cost estimate should exceed the last datum. We calculate the cumulative
probabilities by summing the filter probabilities for filters with final cost estimates, $\bar{D}_l$, less than
a dummy cost variable X, with linear interpolation between cost estimates; with $\bar{D}_l$ for $l = 1$ to $L$
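sorted in increasing magnitude:

$$ P(D < X \mid Z_N) = \begin{cases} 0 & \text{if } X < \bar{D}_1 \\ \dfrac{X - \bar{D}_1}{\bar{D}_2 - \bar{D}_1}\, p_1(t_N) & \text{if } \bar{D}_1 \le X < \bar{D}_2 \\ \displaystyle\sum_{j=1}^{l-1} p_j(t_N) + \dfrac{X - \bar{D}_l}{\bar{D}_{l+1} - \bar{D}_l}\, p_l(t_N) & \text{if } \bar{D}_l \le X < \bar{D}_{l+1}, \text{ for } 2 \le l \le L-1 \\ 1 & \text{if } X \ge \bar{D}_L \end{cases} \qquad (16) $$
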

The graph of $P(D < X \mid Z_N)$ versus X shows the conditional probability that the final cost will be less
than any particular value. Finer discretization smooths the likelihood curve. The constant-dollar
and current-dollar curves have very similar shapes. We also generate the cumulative
probability  curve  for  project  duration  using  the  parameter  a  and  the  relationship  in  (4). 
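
A simplified version of the cumulative cost curve can be sketched in Python as follows. This sketch accumulates the filter probabilities for each distinct cost scale as a step function; it deliberately omits the midpoint adjustment and the linear interpolation of (16), and the function names and sample filters are hypothetical.

```python
from collections import defaultdict

def cost_curve(filters, probs, translation=0.97):
    """Sorted (final cost, cumulative probability) pairs, constant-dollar case."""
    mass = defaultdict(float)
    for (d, a, k), p in zip(filters, probs):
        mass[translation * d] += p        # marginalize over a and k
    curve, running = [], 0.0
    for cost in sorted(mass):
        running += mass[cost]
        curve.append((cost, running))
    return curve

def prob_cost_at_most(curve, x):
    """Step-function estimate of the probability that the final cost is at most x."""
    result = 0.0
    for cost, cumulative in curve:
        if cost > x:
            break
        result = cumulative
    return result

# Three hypothetical filters sharing two distinct cost scale values.
filters = [(100.0, 0.1, 0.0), (100.0, 0.2, 0.0), (200.0, 0.1, 0.0)]
probs = [0.2, 0.3, 0.5]
curve = cost_curve(filters, probs)
print(curve, prob_cost_at_most(curve, 150.0))
```

The same accumulation over the a values, mapped to completion times with (4), gives the duration curve; note that larger a corresponds to shorter completion time, so the sort order reverses.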

A confounding relationship limits the ability to estimate both d and a when $a t^{2}$ is small
[12]. This problem can be seen by expanding the exponential in (1):

$$ v(t) = d\left[1 - \exp\left(-a t^{2}\right)\right] = d\left[1 - \left(1 - a t^{2} + \tfrac{1}{2} a^{2} t^{4} - \cdots\right)\right] \approx a\,d\,t^{2} + O\left(d\,a^{2} t^{4}\right) $$

where the function $O(\cdot)$ represents higher order terms. When $a t^{2}$ is small, the higher order terms
are  negligible  and  only  the  product  of  a  and  d,  but  not  their  individual  values,  can  be  estimated 
from the data. The relationship $a t^{2} < 0.5$ holds prior to the time of peak expenditure rate, as seen
from (2). Thus, many different Rayleigh curves appear to fit the data from $t_0$ to $t_p$ due to the
canceling effects of changes to a and d. With an independent estimate for either the time of peak
rate of expenditures or the completion time, an analyst may determine the parameter a and
estimate  d  using  the  data.  MMAE  has  the  same  confounding  problem  as  any  statistical  technique 
when only data before the peak expenditure rate are available. If an independent estimate of a is
available, one can put that value into all the filters and apply MMAE to estimate the probability
distribution  of  the  final  cost. 
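
The confounding effect is easy to see numerically. The following minimal Python sketch compares two hypothetical Rayleigh curves that share the same product of a and d: they are nearly indistinguishable early in the program and diverge widely later. The parameter values are illustrative only.

```python
import math

def rayleigh_cumulative(t, d, a):
    """Cumulative earned value from (1)."""
    return d * (1.0 - math.exp(-a * t * t))

def relative_gap(t, first, second):
    """Relative difference between two (d, a) curves at time t."""
    v1 = rayleigh_cumulative(t, *first)
    v2 = rayleigh_cumulative(t, *second)
    return abs(v1 - v2) / v1

# Two hypothetical programs with the same product a * d = 20.
short_cheap = (1000.0, 0.02)   # peak rate near year 5
long_costly = (2000.0, 0.01)   # peak rate near year 7

early_gap = relative_gap(1.0, short_cheap, long_costly)    # well before either peak
late_gap = relative_gap(10.0, short_cheap, long_costly)    # after both peaks
print(f"year 1 gap: {early_gap:.3%}, year 10 gap: {late_gap:.1%}")
```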

6.  Algorithm  Steps 

While  the  development  of  the  algorithm  is  complex,  implementation  is  not  difficult.  An 
Excel  spreadsheet  with  a  Visual  Basic  Module  that  applies  this  technique  is  available  from  the 
authors.  The  runtime  on  a  486  computer  is  about  1  minute  with  50  data  points.  The  procedure 
steps  are  enumerated  below: 

Step 1) Adjust the history of cost reports for inflation
- Determine the delta between cumulative cost reports
- Apply the appropriate inflation index to the delta
- Sum the constant dollar deltas to cumulative base-year costs
- Determine time indices in years for each datum from the program start date

Step 2) Determine the completion time range (may be fixed to a single value)
- Default range is from the time index of the last cost report to 15 years (arbitrary)
- Adjust completion time range based on program knowledge
- Relate completion time range to corresponding a range with (4)

Step 3) Determine the range for final cost estimates
- Default for minimum value is last reported incurred costs (in constant dollars)
- Default for maximum value is estimated with (10)
- Adjust final cost range based on program knowledge
- Relate final cost range to range for d with (3)

Step 4) Initialize Kalman filters
- Set number of discrete points for each variable, such as 20 for d and a with 5 for k
- Determine discrete values equally spaced across selected parameter range
- Assign variables for a Kalman filter with each combination of parameter values
- Set prior mean of state distributions to zero at initial time index, $t_0$; $\hat{x}(t_0) = 0$

Step 5) Process data through filters to estimate residual variance and adjust parameter ranges
- Propagate state distribution means with (7)
- Update state distribution means with (8) and (9)
- Collect sum of squared residuals from (8) for each Kalman filter
- Find minimum sum of squared residuals and estimate residual variance, $s_r^2$, with (11)
- Reduce a and d ranges to eliminate values that always resulted in sum of squared residuals greater than 3 times the minimum sum
- Equally space the filter parameters across the reduced parameter ranges
- Reset prior means and set filter probabilities $p_l(t_0) = 1/L$ for $l = 1, \ldots, L$

Step 6) Process data values through bank of filters to determine filter probabilities
- Propagate state distribution means with (7)
- Update state distribution means with (8) and (9)
- Calculate filter probabilities with (12)
- Normalize filter probabilities to meet (13)
- Except for last data point, adjust filter probabilities for lower bound; $p_l(t_i) \ge 0.0001$

Step 7) Determine conditional probabilistic-weighted averages with (14) and (15)

Step 8) Determine conditional cost likelihood curve with (16)
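
Step 5 above, the first pass that estimates the residual variance and narrows the parameter ranges, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the variance is taken as the minimum sum of squared residuals divided by the number of reports (the exact divisor in (11) is assumed here), the clamping inside the filter is a safeguard added for this sketch, and the data and coarse grid are hypothetical.

```python
import math

def sum_squared_residuals(times, costs, d, a, k):
    """Run one Kalman filter with (7)-(9) and return its sum of squared residuals."""
    x_plus, t_prev, total = 0.0, 0.0, 0.0
    for t, z in zip(times, costs):
        frac = min(max(x_plus / d, 0.0), 1.0 - 1e-12)   # safeguard added for this sketch
        shifted = (t - t_prev) + math.sqrt(-math.log(1.0 - frac) / a)
        x_minus = d * (1.0 - math.exp(-a * shifted * shifted))   # (7)
        r = z - x_minus                                          # (8)
        x_plus = x_minus + k * r                                 # (9)
        total += r * r
        t_prev = t
    return total

def first_pass(times, costs, filters):
    """Estimate the residual variance and shrink the d and a ranges."""
    ssr = [sum_squared_residuals(times, costs, d, a, k) for d, a, k in filters]
    best = min(ssr)
    variance = best / len(costs)      # assumed divisor for the variance estimate
    # A d or a value survives if at least one filter using it is within 3x the minimum.
    kept = [f for f, s in zip(filters, ssr) if s <= 3.0 * best]
    d_range = (min(f[0] for f in kept), max(f[0] for f in kept))
    a_range = (min(f[1] for f in kept), max(f[1] for f in kept))
    return variance, d_range, a_range

# Hypothetical yearly reports: a Rayleigh curve with alternating 2 percent errors.
true_d, true_a = 1000.0, 0.1
times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
costs = [true_d * (1.0 - math.exp(-true_a * t * t)) * (1.0 + 0.02 * (-1) ** i)
         for i, t in enumerate(times)]
filters = [(d, a, k)
           for d in (600.0, 800.0, 1000.0, 1200.0, 1400.0)
           for a in (0.05, 0.1, 0.2)
           for k in (0.0, 0.5)]
variance, d_range, a_range = first_pass(times, costs, filters)
print(variance, d_range, a_range)
```

The reduced ranges would then be re-discretized with equal spacing before the second pass in Step 6.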


7.  Sample Applications

We applied the Bayesian estimation approach to three diverse historic programs: the F-15
airframe development, the NavStar Global Positioning System (GPS) Satellite, and the MK 50
Torpedo. We selected these programs to cover a variety of technologies without prior knowledge
of how well the Rayleigh model fit them. The F-15 development contract completed on schedule
with very slight cost growth. The satellite program experienced much higher final cost than
originally projected. The MK 50 program required a substantial schedule increase beyond the
originally projected development time and cost almost twice the original estimate.

The only program data used were the originally projected duration and the actual cost reports. We
set the completion time ranges from the originally projected length to twice that length in each
application.


The F-15 airframe development contract started in January 1970. The contract continued for over 8 years, but almost all the earned value occurred in the first 5 years. The Rayleigh model fits the reported expenditures reasonably well. Figure 1 shows the Rayleigh model with the least squares parameters and the cost reports adjusted for inflation.

We applied our Bayesian cost estimation approach with the first 3, 4, and 5 years, and then with all years, of F-15 airframe expenditures. We set the completion time range from 5 to 10 years. Figure 2 depicts the resulting likelihood curves. The likelihood curve based on only 3 years of data indicates a wide potential range for the final cost. When 4 or more years of data were used, the likelihood curves were very close to the actual final cost.

Figure 2. F-15 Airframe Development Final Cost Likelihood Curves

Most of the techniques used today to predict final costs of R&D programs give a point estimate for the final cost. The MMAE method gives much more than a point estimate. Nevertheless, to compare with other techniques, we had to select a point estimate. We compared the expected value from the current dollar likelihood curve with four techniques that predict point estimates for final program cost.

For each of the four techniques, the final cost estimate is the actual cost of work performed (ACWP) plus the work remaining divided by a cost performance index (CPI). Work remaining is the budgeted work minus management reserve and the budgeted cost of work performed (BCWP). The four techniques differ in how the CPI is calculated. The cumulative CPI (Cum CPI) is the cumulative BCWP divided by the cumulative ACWP. Similarly, CPI-3 and CPI-6 are calculated with BCWP and ACWP over the last 3 or 6 months, respectively. The cumulative CPI times the cumulative schedule performance index (CPI*SPI) is the CPI multiplied by the cumulative budgeted cost of work scheduled divided by the ACWP. We compare these four techniques with the expected value for the three historical programs.
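The index-based estimate described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and all the dollar figures are hypothetical, and only the cumulative and windowed CPI variants are shown.

```python
def work_remaining(budget, management_reserve, bcwp_cum):
    """Budgeted work minus management reserve and work already performed."""
    return budget - management_reserve - bcwp_cum


def cumulative_cpi(bcwp_cum, acwp_cum):
    """Cum CPI: cumulative BCWP divided by cumulative ACWP."""
    return bcwp_cum / acwp_cum


def windowed_cpi(bcwp_monthly, acwp_monthly, months):
    """CPI-3 or CPI-6: BCWP over ACWP for the last `months` reports."""
    return sum(bcwp_monthly[-months:]) / sum(acwp_monthly[-months:])


def final_cost_estimate(acwp_cum, remaining, cpi):
    """ACWP plus the work remaining divided by the chosen CPI."""
    return acwp_cum + remaining / cpi


# Hypothetical monthly reports (not taken from any of the three programs).
bcwp_monthly = [30.0, 30.0, 30.0, 30.0, 30.0, 30.0]
acwp_monthly = [30.0, 32.0, 34.0, 36.0, 38.0, 40.0]
bcwp_cum, acwp_cum = sum(bcwp_monthly), sum(acwp_monthly)

remaining = work_remaining(budget=1000.0, management_reserve=50.0,
                           bcwp_cum=bcwp_cum)
eac_cum = final_cost_estimate(acwp_cum, remaining,
                              cumulative_cpi(bcwp_cum, acwp_cum))
eac_cpi3 = final_cost_estimate(acwp_cum, remaining,
                               windowed_cpi(bcwp_monthly, acwp_monthly, 3))
print(round(eac_cum, 1), round(eac_cpi3, 1))
```

Because the hypothetical cost efficiency worsens in the most recent months, the CPI-3 estimate comes out higher than the cumulative one, which is the kind of spread between techniques that the tables in this section report.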

Table 1 lists the various final cost estimates. The CPI techniques were low with the initial cost data and increased over time. In contrast, the expected values from the Rayleigh/MMAE approach started much too high when only data prior to the peak expenditure rate were available and decreased with additional data.

Table 1. F-15 Airframe Contract Final Cost Estimates (Current Dollars)

Years of Data    CUM CPI    CPI-3    CPI-6    CPI*SPI    Rayleigh/MMAE
2    752.4    764.8    784.7    1,880.4
3    779.4    775.3    1113    1,016.3
4    696.8    689.4    681.2    834.6
5    820.9    819.0    816.2    836.3

The program manager estimate in Mar 1978 (8.25 years) was 850.0.

The second sample program is the NavStar Global Positioning System (GPS) Satellite. This R&D program, which began in June 1974, had a projected completion time of 4.3 years and a projected final cost of 40 million in then-year dollars. The program required almost 6 years and 116.3 million in then-year dollars. The cumulative costs in constant dollars appear to fit the Rayleigh model; Figure 3 depicts the rate of expenditures and the derivative of the least squares Rayleigh model. We calculated the expenditure rates as the increase in reported cumulative expenditures divided by the time between cost reports. The Kalman filter gain, k, accounts for the measurement noise in the cost reports, which is apparent from the variation in reported expenditure rates. A quick heuristic, based on the Rayleigh model, for evaluating progress in R&D programs is that 60 percent of the expenditures occur after the time of peak expenditure rate, $t_p$.

Figure 3. NavStar GPS Satellite Rate of Expenditures

We applied the Bayesian method with 2, 3, 4, and 5 years of expenditure data, and Figure 4 depicts the final cost likelihood curves. Without data after the time of peak expenditure rate, the completion time and final cost are statistically confounded. Since the peak rate occurs just after 2 years in the NavStar R&D program, the cost likelihood curve based on 2 years of data indicates the potential for a very long and expensive program. The level expenditure rate during the fourth year, shown in Figure 3, caused the likelihood curves based on 3 and 4 years of data to underestimate the final cost. With 5 years of data, the likelihood curve is very accurate.

Figure 4. NavStar GPS Satellite Final Cost Likelihood Curves
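The 60 percent heuristic quoted for the Rayleigh model can be checked numerically. The sketch below assumes the cumulative Rayleigh form F(t) = 1 - exp(-a t^2); its derivative, 2 a t exp(-a t^2), peaks at t_p = 1/sqrt(2a), so the share of the Rayleigh scale spent after the peak is exp(-1/2), about 0.61, whatever the value of a. The function names and sample shape parameters are hypothetical, and if the final cost is defined at a finite completion time the share differs slightly.

```python
import math


def peak_time(a):
    """Time of peak expenditure rate for F(t) = 1 - exp(-a * t**2)."""
    return 1.0 / math.sqrt(2.0 * a)


def fraction_after_peak(a):
    """Share of the Rayleigh total spent after the peak: 1 - F(t_p)."""
    t_p = peak_time(a)
    return math.exp(-a * t_p ** 2)


# Hypothetical shape parameters; the share does not depend on them.
for a in (0.02, 0.1, 1.0):
    print(a, round(peak_time(a), 3), round(fraction_after_peak(a), 4))
```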

We used the final filter probabilities from the same runs to generate the program duration likelihood curves. The duration range was from the original projection of 4.3 years to 8.6 years. The duration likelihood curves, shown in Figure 5, remain fairly consistent until 5 years of data were used. Figure 3 shows that the fourth year of data had a higher rate of expenditures than predicted with the Rayleigh model; the likelihood curves conditioned on 5 years of data indicate an increased probability of a longer program. Data fluctuations seem to affect the duration likelihood curves more than the curves for final cost.

We present the various final cost estimates in Table 2. The CPI techniques were low initially and increased with more data. In contrast, the expected values from the proposed approach remained slightly below the actual final cost.

Figure 5. NavStar GPS Satellite Duration Likelihood Curves

Table 2. NavStar GPS Satellite Final Cost Estimates

Years of Data    CUM CPI    CPI-3    CPI-6    CPI*SPI    Rayleigh/MMAE
2    70.0    80.9    78.7    71.0    109.8
3    99.8    100.3    101.7    103.3    96.2
4    104.4    108.7    104.3    106.9    98.6
5    114.0    114.5    114.4    115.5    112.2

The program manager estimate in Aug 1979 (5.25 years) was 116.3.

The final example is the development of the MK 50 Torpedo. This program began in August 1983 with a 5-year projected duration. The program was extended an additional 3 years, and the final cost was 65 percent higher in current dollars. The completion time range was set from 5 to 10 years. The cost likelihood curves are depicted in Figure 6. A small probability of the cost being as high as the actual final cost is seen with even 3 years of data. With each year of additional data, the expected value from the cost likelihood curves moves closer to the actual final cost. With 6 and 7 years of data, much of the likelihood curves exceed the final cost because the lower bound of the curves is the cost incurred to that point in time.

Figure 6. MK 50 Torpedo Final Cost Likelihood Curves

The various final-cost point estimates, shown in Table 3, increased significantly. All the techniques started too low and increased as additional data became available.

Table 3. MK 50 Torpedo Final Cost Estimates

Years of Data    CUM CPI    CPI-3    CPI-6    CPI*SPI    Rayleigh/MMAE
3    580.7    566.3    589.0    409.8
4    529.8    540.1    536.8    527.8    559.8
5    655.7    667.1    659.4    650.6    629.4
6    685.3    678.8    680.8    685.0    746.3
7    707.3    706.9    706.6    708.9    714.5

The program manager estimate in Dec 1990 (7.25 years) was 711.4.

These three examples demonstrate the capabilities of this Bayesian cost estimation approach for on-going R&D programs. In each of the applications, the algorithm produced final cost likelihood curves that are very near the actual final costs based on very little program-specific data. Tighter bounds on the possible range of final cost or completion times based on additional program knowledge would improve the proposed method's results. A Monte Carlo analysis, presented in the next section, shows the statistical effectiveness of this approach.

8.  Monte  Carlo  Analysis 

We  conducted  a  Monte  Carlo  analysis  of  this  technique  with  generated  noise-corrupted 
Rayleigh  data  to  verify  its  statistical  validity.  We  evaluated  the  algorithm  estimates  for  accuracy 
of  point  estimates  and  accuracy  of  the  final  cost  likelihood  curves.  The  performance  statistics 
were  collected  after  applying  the  algorithm  with  various  amounts  of  the  generated  data. 

Various  final  costs,  completion  times,  and  noise  levels  determined  specific  cases.  We 
generated  quarterly  data  such  that  the  initial  cost  at  time  zero  was  zero,  the  cumulative  cost 
always  increased,  and  the  cost  at  completion  time  was  the  final  cost.  For  each  cost  report,  the 
generated  datum  was  calculated  with 

$$z_i = v(t_i) = d\,\bigl[\,F(t_{i-1}) + \bigl(F(t_i) - F(t_{i-1})\bigr)(1 + e)\,\bigr]$$

such that $v(t_0) = 0$, $v(t_f) = D$, and $v(t_i) > v(t_{i-1})$,

where $F(t) = 1 - \exp(-a t^2)$ is the cumulative Rayleigh distribution function, $d$ is from (3), $a$ is from (4), and $e$ is a uniform random variable between plus and minus the noise level.
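The data-generation rule can be sketched as follows. This is a minimal illustration, not the authors' generator: the scale d and shape a are passed in directly rather than computed from (3) and (4), the sample values are hypothetical, and the last increment is left noise-free as one assumed way of pinning the final datum to d F(t_f).

```python
import math
import random


def rayleigh_cdf(t, a):
    """Cumulative Rayleigh function F(t) = 1 - exp(-a * t**2)."""
    return 1.0 - math.exp(-a * t * t)


def generate_cost_reports(d, a, final_time, noise_level, dt=0.25, seed=0):
    """Noise-corrupted cumulative cost reports at spacing dt (quarterly)."""
    rng = random.Random(seed)
    n = round(final_time / dt)
    reports = [0.0]  # cost at time zero is zero
    for i in range(1, n + 1):
        f_prev = rayleigh_cdf((i - 1) * dt, a)
        f_curr = rayleigh_cdf(i * dt, a)
        # Assumption: no noise on the final increment, so z_n = d * F(t_f).
        e = rng.uniform(-noise_level, noise_level) if i < n else 0.0
        reports.append(d * (f_prev + (f_curr - f_prev) * (1.0 + e)))
    return reports


# Hypothetical parameters for a 12-year program with noise level 0.1.
z = generate_cost_reports(d=1000.0, a=0.025, final_time=12.0, noise_level=0.1)
print(len(z), round(z[-1], 1))
```

Only the current increment is perturbed, so an error in one report does not carry into the next, and with noise levels well below 1 the cumulative series stays increasing for these parameters.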

We tested seven cases. The final costs used were 2,000, 1,500, and 1,000 for a 12-year program. The Rayleigh shape parameters were determined with (4), and the noise level for 5 cases was set at 0.1. We varied the noise level to 0.2 and 0.3 for the 12-year program with a final cost of 1,000 dollars. We also varied the completion time for the 1,000 dollar program to 9 and to 6 years. For each case, summary statistics were collected across 500 data sets; we applied the algorithm both with and without using the known completion times. In all the tables to be presented, the first three columns define the case by giving the true final cost, the true program completion time, and the noise level used to generate the data. The next sets of columns show results based on increasing amounts of data used in the estimates. For example, the column with a "Time of Estimate" of 3 indicates that 3 years of quarterly data were used to calculate the statistics in that column. We define errors as the estimated value minus the true value. The top halves of the tables are results based on estimated completion times, and the bottom halves present the results when the program completion time is known, in essence estimated perfectly.

The  first  measure  of  effectiveness  is  the  accuracy  of  the  probabilistic  mean  in  estimating 
the  true  cost  used  to  generate  the  data.  We  calculated  the  probabilistic  mean  with  (15)  and 
adjusted  to  final  cost  with  (3).  Table  4  shows  the  statistics  for  the  seven  cases.  For  a  12  year 
program with unknown completion time, the results with 3 years of data have large errors and
correspondingly large standard deviations. This is a result of the statistical indeterminacy between
the  cost  scale  parameter  and  the  Rayleigh  shape  parameter.  If  the  final  time  of  the  program  is 
known,  the  errors  in  the  final  costs  are  very  small  as  seen  in  the  bottom  half  of  Table  4.  The 
errors  with  unknown  completion  times  are  conservative  in  that  they  estimate  the  program  to  be 
much  higher  in  cost  and  longer  than  it  actually  was.  With  data  that  encompass  half  the  actual 
completion  time,  the  errors  become  very  small  in  comparison  with  the  final  cost  with  relatively 
small  variance. 

Table 4.

                     Average Errors                     Error Standard Deviations
Final   Final  Noise Time of Estimate                   Time of Estimate
Cost    Time   Level        3       6       9      12          3       6       9      12

Estimated Final Cost and Estimated Final Time
2,000   12     0.1      434.0     5.7     1.3     0.3      562.6    13.1     2.0     0.3
1,500   12     0.1      325.6     4.3     0.9     0.2      422.1     9.8     1.5     0.2
1,000   12     0.1      217.1     2.9     0.6     0.1      281.4     6.6     1.0     0.2
1,000   12     0.2      208.9     3.2     0.0     0.7      302.6    18.9     3.0     0.6
1,000   12     0.3      161.9     8.2     0.1     1.0      312.7    44.2     5.3     0.9
1,000   9      0.1       70.1     3.9     0.2              186.6     9.9     0.3
1,000   6      0.1       11.8     1.8                       31.8     0.5

Estimated Final Cost with Given Final Time
2,000   12     0.1       -0.3    -0.1     0.0     0.1       20.7     4.5     1.2     0.2
1,500   12     0.1       -0.2    -0.1     0.0     0.1       15.5     3.4     0.9     0.2
1,000   12     0.1       -0.2    -0.1    -0.1     0.0       10.4     2.3     0.6     0.1
1,000   12     0.2        0.4     0.0     0.2     0.4       20.3     5.0     1.4     0.4
1,000   12     0.3        3.3    -0.1    -0.1     0.5       31.0     7.6     1.9     0.5
1,000   9      0.1       -1.1     0.2     0.1                9.0     2.2     0.2
1,000   6      0.1       -8.2     0.3                        6.1     0.3

The first three cases show the linear effect of changes in the true final cost. Since we used the same seed in the random number generator, the error statistics are exactly proportional to the true final cost. The third through fifth cases show that as the noise levels increase so do the estimate standard deviations. The errors for the shorter programs in the last two cases are smaller because proportionately more program data was used for the estimates. In all cases, the error statistics improve with additional data, and the errors are very small when the completion time was known. We did not include the results for the median because of their similarity.

The second measure of effectiveness is how well the method quantifies the cost risk of continuing the program. The cost risk is depicted with the cumulative cost curves generated with (16). We evaluated the curves by collecting the frequency with which the true cost was less than the predicted 30th and 70th percentiles. Table 5 shows the statistics for 500 runs for each case. When the reported frequency for the 30th percentile exceeds 0.30, the curve estimates were too high. Following the trend of the mean and median, the 30th percentile was high initially and decreased as the amount of data increased. When all the data was used, the entire cumulative cost curve exceeds the value of the last data point, which was the true final cost, because the cumulative cost projections always exceed reported incurred costs. When the final time was known, the 30th percentiles were slightly low and the 70th percentiles were slightly high.

Table 5. Estimated Percentile Efficiencies

                     Frequency < 30th Percentile        Frequency < 70th Percentile
Final   Final  Noise Time of Estimate                   Time of Estimate
Cost    Time   Level        3       6       9      12          3       6       9      12

Estimated Final Cost and Estimated Final Time
2,000   12     0.1       0.54    0.38    0.26    1.00       0.75    0.96    0.98    1.00
1,500   12     0.1       0.54    0.38    0.26    1.00       0.75    0.96    0.98    1.00
1,000   12     0.1       0.54    0.38    0.26    1.00       0.75    0.96    0.98    1.00
1,000   12     0.2       0.57    0.31    0.10    1.00       0.79    0.95    0.83    1.00
1,000   12     0.3       0.50    0.37    0.13    1.00       0.75    0.82    0.85    1.00
1,000   9      0.1       0.26    0.48    1.00               0.71    0.87    1.00
1,000   6      0.1       0.47    1.00                       0.90    1.00

Estimated Final Cost with Given Final Time
2,000   12     0.1       0.23    0.17    0.11    1.00       0.77    0.82    0.92    1.00
1,500   12     0.1       0.23    0.17    0.11    1.00       0.77    0.82    0.92    1.00
1,000   12     0.1       0.23    0.17    0.11    1.00       0.77    0.82    0.92    1.00
1,000   12     0.2       0.29    0.22    0.17    1.00       0.69    0.79    0.85    1.00
1,000   12     0.3       0.32    0.21    0.10    1.00       0.72    0.78    0.86    1.00
1,000   9      0.1       0.43    0.15    0.98               0.47    0.88    1.00
1,000   6      0.1       0.10    0.99                       0.10    1.00

Note: The theoretical standard deviation of these frequencies is 0.0205.
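The percentile check can be illustrated with a short sketch. The value 0.0205 quoted in the note to Table 5 is consistent with the standard deviation of a binomial proportion, sqrt(p(1 - p)/n), for p = 0.30 (or 0.70) and n = 500 runs; that reading is an inference from the arithmetic, not something stated in the text. The function names and sample numbers below are hypothetical.

```python
import math


def coverage_frequency(true_cost, predicted_percentiles):
    """Fraction of runs in which the true cost fell below the predicted
    percentile (one predicted value per Monte Carlo run)."""
    hits = sum(1 for p in predicted_percentiles if true_cost < p)
    return hits / len(predicted_percentiles)


def binomial_frequency_sd(p, runs):
    """Standard deviation of an observed frequency with true probability p."""
    return math.sqrt(p * (1.0 - p) / runs)


# Hypothetical 30th-percentile predictions from four runs.
freq = coverage_frequency(1000.0, [950.0, 1020.0, 1100.0, 990.0])
sd = binomial_frequency_sd(0.30, 500)
print(freq, round(sd, 4))
```

Under that reading, a reported frequency more than a few multiples of 0.02 away from its nominal 0.30 or 0.70 reflects miscalibration of the curve rather than sampling noise.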

The final measure of effectiveness is the width of the 40 percent confidence interval formed from the 30th to the 70th percentiles. The confidence interval widths indicate the accuracy the algorithm assigns to the mean estimates in Table 4. Table 5 shows that these assigned accuracies are commensurate with their true accuracies. Table 6 shows that as additional data was used in the algorithm, the confidence interval widths became very small. Using all the data, the point estimator error was less than 0.2 percent of the true final cost, and the corresponding 40 percent confidence interval width was less than 0.4 percent of the true final cost.

Table 6. Estimated 40 Percent Confidence Interval Width

                     Confidence Interval Width
                     (Distance Between 30th and 70th Percentiles)
Final   Final  Noise Time of Estimate
Cost    Time   Level        3       6       9      12

Estimated Final Cost and Estimated Final Time
2,000   12     0.1      343.5    22.9     7.1     1.0
1,500   12     0.1      257.4    17.2     5.4     0.8
1,000   12     0.1      171.6    11.5     3.6     0.5
1,000   12     0.2      197.7    20.9     7.8     1.2
1,000   12     0.3      226.4    48.2    10.9     1.6
1,000   9      0.1      197.3    16.5     1.0
1,000   6      0.1       33.9     3.4

Estimated Final Cost with Given Final Time
2,000   12     0.1       25.1     7.7     4.0     0.7
1,500   12     0.1       18.8     5.8     3.0     0.8
1,000   12     0.1       12.6     3.8     2.0     0.4
1,000   12     0.2       20.7     7.6     3.5     1.0
1,000   12     0.3       29.1    11.3     5.4     1.2
1,000   9      0.1        1.5     2.8     1.0
1,000   6      0.1        0.0     2.5
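The interval width reported in Table 6 can be read off a discrete likelihood curve as sketched below. This is an assumption-laden illustration rather than the authors' equation (16): it treats each filter as contributing its probability at a single final-cost value and sums those probabilities in cost order. The function names, costs, and probabilities are hypothetical.

```python
def likelihood_curve(final_costs, filter_probs):
    """Cumulative probability that the final cost is at or below each value."""
    curve, running = [], 0.0
    for cost, prob in sorted(zip(final_costs, filter_probs)):
        running += prob
        curve.append((cost, running))
    return curve


def percentile(curve, q, tol=1e-12):
    """Smallest cost whose cumulative probability reaches q."""
    for cost, cumulative in curve:
        if cumulative >= q - tol:
            return cost
    return curve[-1][0]


# Hypothetical final costs and normalized filter probabilities.
costs = [100.0, 110.0, 120.0, 130.0, 140.0]
probs = [0.1, 0.2, 0.4, 0.2, 0.1]
curve = likelihood_curve(costs, probs)
width = percentile(curve, 0.70) - percentile(curve, 0.30)
print(percentile(curve, 0.30), percentile(curve, 0.70), width)
```

A finer grid of filter parameters gives a smoother curve; with a coarse grid such as this one, the percentiles can only fall on the tabulated cost values.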

9.  Summary 

We developed and tested a method of estimating the probability of final cost and completion time for R&D programs conditioned on actual cost reports. The method assumes that the cumulative earned value (represented by constant-dollar expenditures) of the development program follows a Rayleigh distribution. The approach uses Multiple Model Adaptive Estimation (MMAE), which employs a large number of Kalman filters, to estimate the Rayleigh model parameters. The MMAE technique, as applied here, provides the probabilities of various final cost estimates and projected completion times conditioned on actual cost data. We summed those probabilities to produce final cost likelihood curves. These curves depict the likelihood that the final cost will be below various cost values. The final cost estimates and likelihood curves can be converted from constant dollars to current dollars. Similarly, likelihood curves for completion time can be constructed.